Abstract



Taken traditionally as a no-go theorem against the theorization of inductive processes,
the Duhem-Quine thesis may interfere with the essence of statistical inference.
This difficulty can be resolved by "Micro-Macro duality" [23] [2l], which clarifies the
importance of specifying the pertinent aspects and accuracy relevant to concrete
contexts of scientific discussions and which ensures the matching between what is to
be described and what is to describe it, in the form of the validity of duality relations.
This consolidates the foundations of the inverse problem, the induction method, and
statistical inference, which are crucial for sound relations between theory and experiments.
To achieve this purpose, we propose here the Large Deviation Strategy (LDS for short)
on the basis of Micro-Macro duality, the quadrality scheme, and the large deviation
principle. According to the quadrality scheme, which emphasizes the basic roles played by
the dynamics, the algebra of observables together with its representations, and the
universal notion of classifying space, LDS consists of four levels, and we discuss its first
and second levels in detail, aiming at establishing statistical inference concerning
observables and states. By efficient use of the central measure, we will establish a
quantum version of Sanov's theorem, the Bayesian escort predictive state and the
widely applicable information criteria for quantum states in the second level of LDS.
Finally, these results are reexamined in the context of quantum estimation theory, and
organized as quantum model selection, i.e., a quantum version of model selection.



1 Statistical Inference vs. Duhem-Quine Thesis 

The main purpose of the present paper is to propose a general method for statistical in- 
ference which we call Large Deviation Strategy (LDS for short). To see the importance of 
this task, we first contrast it with the following famous dilemma of Duhem-Quine thesis. 
Duhem-Quine thesis: It is impossible to determine uniquely such a theory from phe- 
nomenological data as to reproduce the latter, because of unavoidable finiteness in number 
of measurable quantities and of their limited accuracy. 

According to the standard interpretation of this thesis as a no-go theorem against the 
possibility of theorizing inductive processes, the communities of sciences (and philosophy 
of sciences) have long been dominated by such common and/or implicit consensus that 
the inductive aspects can be treated only in intuitive and heuristic manners without being 
incorporated into theories where only deductive arguments can be developed from some 
tentative and ad hoc starting postulates without satisfactory bases. In this situation,
we would totally lose any sound basis for the mutual connections between experimental
and theoretical sides, so that any attempt at statistical estimation and inference would
become meaningless. While this issue is seldom taken seriously by working scientists such
as physicists, the reason still remains to be explained why experimental sciences can 
work in spite of this no-go theorem; this question cannot be answered by the present-day 
forms of sciences (nor by philosophy of sciences) in the standard formulation, for lack 
of the theoretical elements of induction. Since "Macro" from the standard viewpoint of 
microscopic physics is nothing more than rough approximations of "Micro" levels, such 
important theoretical roles played by it as universal reference systems or its origin are 
hardly examined, and hence, no justification can be given of the status of "Macro". In 
the light of the above Duhem-Quine thesis, therefore, it becomes evident not only that 
the sacred "Micro" theory itself in the usual approaches is just something postulated in 
an ad hoc way without any inevitable basis, for lack of the unique choices of theoretical 
starting points on the "Micro" side in relation to the "Macro" data, but also that the 
latter side is floating in the air without firm bases. 

In sharp contrast, the formulation based on "Micro-Macro duality" [24] proposed by 
one of the authors (I.O.) resolves the above conflict in a natural way, on the basis of 
the duality between the "Micro" side to be described and the "Macro" side to describe. 
Therefore, the essence of "Micro-Macro duality" needs to be discussed first.

1.1 Micro-Macro duality solving Duhem-Quine thesis 
and quadrality scheme 

The notion of dualities can be formulated mathematically in its general form as categor- 
ical adjunctions [18] materializing the important aspects of mathematical universalities. 
In this context, "Micro" and "Macro" are interrelated with each other by "Micro-Macro 
duality" in bi-directional ways: "Macro" playing the roles of a standard reference frame 
is generated as a stabilized domain through the processes of emergence [20] from the 
dynamical motions in "Micro". In the opposite direction, "Macro" $\to$ "Micro", the
extended machinery based on the so-called "dilation" method allows us to recover the 
original microscopic system, "Micro", from phenomenological and/or experimental data 
in "Macro", by means of such generalizations of the inverse Fourier transform as
Tannaka-Krein-Tatsuuma duality [30, 21] and as Galois extensions materialized by crossed
product formation. In this way, the essence of the Micro-Macro duality can be
understood as the adaptations to natural sciences of the mathematical notion of duality (or
adjunction) appearing ubiquitously in mathematics. 

What is most important here is the validity of mathematical universalities, which re- 
solves the difficulties caused by the no-go theorem of Duhem-Quine thesis in the following 
way. According to the thesis, we cannot avoid any kind of indeterminacy on the phe- 
nomenological "Macro" side based on the statistical inference, because of the inevitable 
finiteness in number of measurable quantities and of their limited accuracy, which will lead 
to possible non-uniqueness of the results of inductions in the form of a theoretical starting 
point of "Micro" extracted from the phenomenological "Macro". Because of the
universality associated with "Micro-Macro duality", the duality between "Micro" and "Macro", however,
the uniqueness of "Micro" is guaranteed within the context specified by the "Macro" in 
such forms as the relevant aspects and accuracies compatible with the phenomenological 
data. 

In the standard approach in physics concentrating on the unilateral efforts to derive 
experimental predictions from theoretical hypotheses on the purely "Micro" side, this 
kind of approach might be unfamiliar. So, we try to explain briefly the essence of some
key notions relevant to duality. First, the notion of duality is widely applied in many
mathematical contexts in such a form as the duality between an abstract group and (the
totality of) its representations. One can also find duality in physics in such a form as
that between position $x$ and momentum $p$, which is essential for the basis of many concepts.
The most typical example in the present context is the duality between observables and states.
In the algebraic formulation of quantum theory, observables are defined as (self-adjoint)
elements of a C*-algebra, and states (also called expectation values) as normalized
positive linear functionals on the algebra of observables. A simple example of this sort is
given by the well-known Gel'fand isomorphism between a commutative C*-algebra and a
Hausdorff space as its spectrum. In more detail, denoting the categories of commutative
C*-algebras and of Hausdorff spaces, respectively, by CommC*Alg and HausSp, we have
the following isomorphic relations between the relevant morphisms in the two categories
for $\mathfrak{A} \in$ CommC*Alg, $M \in$ HausSp,

$\mathrm{CommC^*Alg}(\mathfrak{A}, C_0(M)) \simeq \mathrm{HausSp}(M, \mathrm{Spec}(\mathfrak{A}))$, (1)

where $C_0(M)$ is the commutative C*-algebra consisting of functions on $M$ vanishing at
infinity, and $\mathrm{Spec}(\mathfrak{A}) := \{\chi : \mathfrak{A} \to \mathbb{C} \mid \chi$: character satisfying $\chi(AB) = \chi(A)\chi(B)$ for
$A, B \in \mathfrak{A}\}$. The isomorphism $\simeq$ is determined by the equality $[\varphi^*(x)](A) = [\varphi(A)](x)$
for a *-homomorphism $\varphi : \mathfrak{A} \to C_0(M)$ and its dual map $\varphi^* : M \to \mathrm{Spec}(\mathfrak{A})$. When
$M = \mathrm{Spec}(\mathfrak{A})$, this reduces to the identification, $\mathfrak{A} \simeq C_0(\mathrm{Spec}(\mathfrak{A}))$, between an abstract
commutative C*-algebra $\mathfrak{A}$ and a concrete commutative C*-algebra $C_0(\mathrm{Spec}(\mathfrak{A}))$ of
continuous functions on $\mathrm{Spec}(\mathfrak{A})$ through the relation $\mathfrak{A} \ni A \longleftrightarrow \hat{A} \in C_0(\mathrm{Spec}(\mathfrak{A}))$ defined
by $\chi(A) = \hat{A}(\chi)$, $\chi \in \mathrm{Spec}(\mathfrak{A})$. In this connection, a state as an expectation value can be
shown just to correspond to a probability measure according to the Markov-Kakutani
theorem [S], which already involves such a statistical aspect as the i.i.d. property, as will be shown
later. Other examples are given as follows:

Example 1.1. A finite-dimensional vector space $V$ is isomorphic with its second dual:

$V \cong V^{**}$. (2)

Example 1.2. Let $G$ be a locally compact abelian group. Its dual group $\hat{G}$, defined by the
set of unitary characters on $G$, is also a locally compact abelian group w.r.t. the pointwise
product. Furthermore, $\widehat{(\hat{G})} =: \hat{\hat{G}}$, called the second dual group, can be defined and the
following relation, called Pontryagin duality, holds:

$G \cong \hat{\hat{G}}$, (3)

as topological groups.
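As a small illustration of Example 1.2, the following sketch checks Pontryagin duality numerically for the finite cyclic group $\mathbb{Z}_n$. The choice of $\mathbb{Z}_n$ and all function names (`characters`, `is_character`, `evaluation_table`, `rows_distinct`) are assumptions made for illustration only and do not appear in the text. The idea is that $\hat{G}$ has $n$ elements, so $\hat{\hat{G}}$ also has $n$ elements, and the evaluation map $g \mapsto (\chi \mapsto \chi(g))$ is a bijection as soon as it is injective.

```python
import cmath

def characters(n):
    """The n unitary characters chi_k(g) = exp(2*pi*i*k*g/n) of Z_n (illustrative helper)."""
    return [lambda g, k=k: cmath.exp(2j * cmath.pi * k * g / n) for k in range(n)]

def is_character(chi, n, tol=1e-9):
    """Check chi(g + h) = chi(g) * chi(h) and |chi(g)| = 1 for all g, h in Z_n."""
    return all(
        abs(chi((g + h) % n) - chi(g) * chi(h)) < tol and abs(abs(chi(g)) - 1.0) < tol
        for g in range(n)
        for h in range(n)
    )

def evaluation_table(n):
    """Row g lists ev_g(chi_k) = chi_k(g); ev_g is the element of the second dual attached to g."""
    chars = characters(n)
    return [[chars[k](g) for k in range(n)] for g in range(n)]

def rows_distinct(table, tol=1e-9):
    """Injectivity of g -> ev_g: no two rows of the evaluation table coincide."""
    n = len(table)
    return all(
        any(abs(a - b) > tol for a, b in zip(table[g], table[h]))
        for g in range(n)
        for h in range(g + 1, n)
    )

if __name__ == "__main__":
    n = 6
    print(all(is_character(chi, n) for chi in characters(n)))
    print(rows_distinct(evaluation_table(n)))
```

For a finite group the topological part of the statement is trivial, so this only illustrates the algebraic content of (3).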

In statistics also, duality is known to play essential roles: the Riemannian geometric 
formulation of statistics is called information geometry [4] and based on a duality
structure. For any $\alpha \in \mathbb{R}$, the $\alpha$-connection is defined on the Riemannian manifold consisting
of a family of probability distributions and has the dual connection corresponding to the
$(-\alpha)$-connection. The $\alpha$-connection determines the unique quasi-distance of probability
distributions, called an $\alpha$-divergence, which generalizes the Kullback-Leibler divergence.
Thus the duality is seen to play essential roles in various contexts. 
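The duality between the $\alpha$- and $(-\alpha)$-structures can be checked numerically on finite probability vectors. The sketch below assumes one common convention for the $\alpha$-divergence, $D^{(\alpha)}(p\|q) = \frac{4}{1-\alpha^2}\bigl(1 - \sum_i p_i^{(1-\alpha)/2} q_i^{(1+\alpha)/2}\bigr)$ for $\alpha \neq \pm 1$; the text does not fix a convention, so this formula, the function names and the sample vectors are all assumptions. Under this convention $D^{(\alpha)}(p\|q) = D^{(-\alpha)}(q\|p)$, and the limit $\alpha \to -1$ recovers the Kullback-Leibler divergence $D(p\|q)$.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i) for strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def alpha_divergence(p, q, alpha):
    """alpha-divergence in the convention assumed in the lead-in (alpha != +1, -1)."""
    if abs(alpha) == 1:
        raise ValueError("use kl() for the limiting cases alpha = +1 or -1")
    overlap = sum(
        pi ** ((1 - alpha) / 2) * qi ** ((1 + alpha) / 2) for pi, qi in zip(p, q)
    )
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - overlap)

if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]
    q = [0.4, 0.4, 0.2]
    # Duality: exchanging the arguments corresponds to alpha -> -alpha.
    print(alpha_divergence(p, q, 0.3), alpha_divergence(q, p, -0.3))
    # Near alpha = -1 the alpha-divergence approaches KL(p || q).
    print(alpha_divergence(p, q, -1 + 1e-6), kl(p, q))
```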

To consolidate the natural inter-relations between experiments and theories, and be- 
tween "Macro" and "Micro", we proceed further to a theoretical framework based on 
the Micro-Macro duality. In terms of the four basic ingredients closely related to the
representation-theoretical context of dynamical systems, a coherent scheme for theoretical
description of a target system of our recognition can be formulated as a pair of duality
pairs, which we call a quadrality scheme [25]:



Here, the dynamics (Dyn) at the bottom creates an algebra (Alg) of observables to char- 
acterize an object system, and the configurations or structures of the objects in (Alg) are 
described mathematically in terms of the notion of states (as interfaces between Micro and
Macro) and the associated (GNS-)representations of (Alg) which we denote by (Rep). 
Roughly speaking, "Micro" corresponds to a dynamical system consisting of (Dyn) and 
(Alg), and "Macro" to a (co)dynamical one of (Rep) and (Spec). In the direction from 
"Macro" to "Micro", we find two arrows, one from the experimental side to the theo- 
retical one in the form of induction processes, and another in the operational contexts 
of controls over the system under consideration, which should include the aspect of the 
state preparation indispensable in conducting experiments. The former one, induction, 
is materialized usually on the basis of statistical inference, in combination with suitable 
choices of classification schemes. The aspects of control theory and state preparation are 
strongly interrelated. 

To materialize an induction scheme, we should combine the large deviation principle 
(LDP for short) as the mathematical core of statistical inference with the above quadrality 
scheme in view of its essential roles in implementing "Micro-Macro duality" indispens- 
able for overcoming the Duhem-Quine no-go theorem. From this viewpoint, we propose 
in Section 2 Large Deviation Strategy as a systematic method of induction, where the 
importance of statistical inference is emphasized. Here statistical estimation is no more 
than the method to analyze several ingredients such as means, probability distributions 
and coefficients of stochastic differential equations, and is fundamentally based on LDP 
extended by the quadrality scheme. Stein's lemma and Chernoff bounds in hypothesis 
testing are the typical examples in this context. All these discussions explain the rea- 
son why we adopt such naming as Large Deviation Strategy. After briefly mentioning 
in Section 3 two example cases of the application of LDS, we clarify in Section 4 the 
meaningful and precise relations between quantum and classical levels in the context of 
estimation theory, especially concerning the problem of model selection. In Appendix, 
the operational meaning of Tomita's theorem of barycentric decomposition crucial for the 
second level of LDS is explained from the viewpoint of a measurement process. In this 
way, the theoretical bases of LDS can be found in Micro-Macro duality [21], quadrality 
scheme and LDP extended by the quadrality scheme. Before going into LDS, it will be 
instructive to explain the mutual relations among the relevant tools: 

1.2 Interdependence among statistical inference, Micro-Macro 
duality, quadrality scheme and LDS 

The logical relations among the four items, including LDS itself, are actually a kind of
mutual interdependence in the following sense:

i) statistical inference $\Rightarrow$ Micro-Macro duality: as Micro-Macro duality is based on
the duality between the inductive and deductive arguments, it is not possible without the
reliable methods and schemes for statistical inference.

[Figure: the quadrality scheme. Macro side: 2. States (State) and Representations (Rep); 3. Spectrum (Spec). Micro side: 1. Algebra (Alg); 4. Dynamics (Dyn).]

ii) Micro-Macro duality $\Rightarrow$ quadrality scheme: The duality between (Alg) and (Rep)
guarantees the matching between what is to be described in (Alg) and what to describe by 
(Rep), which is just the most important step to resolve the difficulties caused by the no- 
go theorem of Duhem-Quine thesis mentioned at the beginning. To attain a meaningful 
interpretation from the items obtained above, we need to apply some classification to the 
states and representations in (Rep) according to some relevant viewpoints, as a result 
of which we can attain the level of (Spec) containing all the classifying parameters to 
specify each configuration realized in (Rep). Then the validity of duality between (Spec) 
and (Dyn) allows a universal parametrization of the dynamics, (Dyn), of the object system 
in terms of the parameters belonging to (Spec), whose special case can be found in the 
familiar parametrization of a dynamical map $t \mapsto \alpha_t$ in terms of a time parameter $t \in \mathbb{R}$.

iii) quadrality scheme $\Rightarrow$ (an extended form of) LDP: the standard application of
LDP starts from the calculation of a rate function to measure deviations of empirical data 
of an observable from its "true" value (of its average), which is sometimes called the LDP 
at the first level [9]. In view of ii) above, this corresponds to discussing (Alg) (or, more 
precisely, a subalgebra of (Alg) generated by the specific observable under consideration). 
For the purpose of statistical inference, however, what is most relevant is that of such a 
state as generating a certain definite pattern of empirical data, like the case of a quantum 
state yielding a statistical ensemble allowing the Born formula. This requires us to proceed 
from (Alg) to (Rep) in the quadrality scheme in ii) in the context of LDP, which constitutes 
the main contents of Sec. 2.3. We try further to extend the scheme to incorporate the 
level of (Spec) which enables us to deform and adjust the choice of model spaces in an 
optimal way and which we call the LDP third level. Once this is achieved, we can further 
proceed to the inference of the dynamical law (Dyn) of the system under consideration, 
taking advantage of the duality between (Spec) and (Dyn), which can be called the LDP 
fourth level. 

iv) extended LDP $\Rightarrow$ LDS: LDS can be obtained by applying the above extended
scheme of LDP to the context of statistical inference, by means of which we can attain
a full-fledged form of the latter, and hence, we can re-start i). This loop structure can
be easily organized into a helical or spiral form to deepen the levels of our theoretical 
descriptions. 

2 Large Deviation Strategy 

2.1 What is Large Deviation Strategy? 

Now, our Large Deviation Strategy (LDS) is a method of statistical inference by step- 
by-step inductions according to the basic idea constituting the large deviation principle 
(LDP). We suppose that LDS consists of the following four levels just in parallel with 
LDP in its extended form: 

1st level : Abelian von Neumann algebras
Gel'fand representation, strong law of large numbers (SLLN)
and statistical inference on abelian von Neumann algebras

2nd level : States and Reps
Measure-theoretical analysis for noncommutative algebras

3rd level : Spec and Alg
Emergence of space-time from composite systems
of internal and external degrees of freedom

4th level : Dyn
From emergence to space-time patterns and time-series analysis

The aim of the first level is to estimate the spectra of observables and their probability
distributions. If we restrict our attention to mutually consistent observables, this is
equivalent to the problem of estimating the spectrum of the abelian von Neumann
subalgebra generated by the mutually commuting observables. The obtained information
at this stage should help us to restrict the class of states and representations relevant to 
the second level, the latter of which aims at the estimate of states and the associated 
representations defined on the algebra of all observables. To proceed to the third level, 
we consider a composite system consisting of the object system to be described and of the 
macroscopic degrees of freedom arising from the processes of emergence from the micro- 
scopic ones. At the fourth level, we consider the estimate of the dynamics of the system 
which will allow us to proceed to the stage of controlling the object system. The following 
methods will play central roles in LDS: 

I. Large deviation principle [7][9]:
From probabilistic fluctuations to statistical inference

II. Tomita decomposition theorem and central decomposition:
To formulate and use state-valued random variables

III. The dual $\hat{G}$ of a group $G$ and its crossed products:
To reconstruct Micro from the data of Macro

IV. Emergence: Condensation associated with spontaneous symmetry
breaking (SSB) and phase separation in the direction from Micro to Macro

LDP works effectively at each level of our strategy and provides us with the information 
of rate functions in such forms as free energy and relative entropy, for instance. This 
information clarifies to what extent a given quantity in question can deviate from its
fiducial point which is called the "true" value. In this way, LDP is seen to be essential for
statistical inference. As discussed in Sec. 2.3, the notion of state-valued random variables
can successfully be formulated by the use of the Tomita decomposition theorem and central
decompositions. In the second level where states and representations are estimated, this 
formulation enables us to analyze them in terms of "numerical" data. We can also see the 
necessity of the items in the above III and the processes of emergence in the third level 
(in reference to [26] and to the discussion in the previous section). 
2.2 1st Level: Observables and Abelian Subalgebra

As stated in the previous subsection, we discuss here the mean and the probability
distribution of an observable. Let $\mathfrak{A}$ be a C*-algebra, $\psi$ be a state (defined as a normalized
positive linear functional) on $\mathfrak{A}$ and $A$ be an observable to be measured, which is identified
with an element of $\mathfrak{A}$. $\mathcal{A}$ denotes the abelian subalgebra of $\mathfrak{A}$ generated by $A$, and states
on $\mathfrak{A}$ are naturally restricted to $\mathcal{A}$. Therefore, we try to estimate the appropriate pair
$(\mathcal{A}, \psi|_{\mathcal{A}})$. The candidate of $A$ comes from the following theorem.

Theorem 2.1. An abelian von Neumann algebra $\mathfrak{M}$ on a separable Hilbert space $\mathfrak{H}$ is
generated by one element $X$ (belonging to $\mathfrak{M}$).

If $X$ is selfadjoint, then we put $A = X$.

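For an abelian von Neumann algebra $\mathcal{A}$ and a normal state $\psi$ on $\mathcal{A}$, the following
relations hold:

$\langle \Omega_\psi, \pi_\psi(A)\Omega_\psi \rangle = \psi(A) = \int_K \hat{A}(k)\, d\nu_\psi(k)$,

$(\mathfrak{H}_\psi \ni \Omega_\psi \longleftrightarrow 1 \in L^2(K, \nu_\psi))$,

$\hat{A} \in L^\infty(K, \nu_\psi)$,

where $K$ is a compact Hausdorff space and $\nu_\psi$ is a Borel measure on $K$. Every self-adjoint
element $\pi_\psi(A)$ of $\pi_\psi(\mathcal{A})$ is treated as a measure-theoretical $\mathbb{R}$-valued random variable $\hat{A}$.
Thus, we can discuss spectra of observables in the commutative case.

For any $k = (k_1, k_2, \cdots) \in K^{\mathbb{N}}$ and $A = A^* \in \mathcal{A}$, we define $X_j(k) := k_j$ and
$\hat{A}_j(k) := \hat{A}(X_j(k))$. Then we see the validity of

Matching Condition 1. $\{\hat{A}_j\}$ are independent identically distributed ("i.i.d.") random
variables.

For any measure $m$, let $P_m := m^{\otimes \mathbb{N}}$ denote the product measure of $m$ defined by a
countably infinite tensor power. The following theorem is known to hold:

Theorem 2.2 (Cramér's theorem [7]). Let $M_n(k) := \frac{1}{n}(\hat{A}_1(k) + \cdots + \hat{A}_n(k))$ and
$Q_n^{(1)}(\Gamma) := P_{\nu_\psi}(M_n \in \Gamma)$. Then $Q_n^{(1)}$ satisfies LDP with the rate function
$I_\psi(a) = \sup_{t \in \mathbb{R}} \{at - c_\psi(t)\}$ $\left(c_\psi(t) = \log \int e^{tx}\, \nu_\psi(\hat{A} \in dx)\right)$:

$-\inf_{a \in \Gamma^\circ} I_\psi(a) \le \liminf_{n \to \infty} \frac{1}{n} \log Q_n^{(1)}(\Gamma) \le \limsup_{n \to \infty} \frac{1}{n} \log Q_n^{(1)}(\Gamma) \le -\inf_{a \in \overline{\Gamma}} I_\psi(a)$. (4)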

By this theorem, we can discuss the convergence rate of the arithmetic means of 
observables and estimate "true" means. 
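To make the rate function of Theorem 2.2 concrete, the following sketch treats the simplest commutative case, in which the random variable takes the values 1 and 0 with probabilities $p$ and $1-p$ (a Bernoulli variable). This choice of distribution, the numerical maximization by ternary search, and all function names are assumptions made for illustration. The Legendre transform $\sup_t \{at - c(t)\}$ with $c(t) = \log(1 - p + p e^t)$ is compared with its known closed form $a \log\frac{a}{p} + (1-a)\log\frac{1-a}{1-p}$, and the exact tail probability of the arithmetic mean is compared with the exponential decay predicted by (4).

```python
import math

def rate_legendre(a, p, lo=-30.0, hi=30.0, iterations=200):
    """Numerically evaluate sup_t {a*t - log(1 - p + p*e^t)} by ternary search
    (the objective is concave in t)."""
    def objective(t):
        return a * t - math.log(1.0 - p + p * math.exp(t))
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2.0)

def rate_closed_form(a, p):
    """Closed form of the Bernoulli rate function for 0 < a < 1."""
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))

def log_tail_prob(n, p, a):
    """log P(M_n >= a) for the mean M_n of n Bernoulli(p) variables,
    computed with log-sum-exp to avoid overflow and underflow."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
        for k in range(math.ceil(n * a), n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

if __name__ == "__main__":
    p, a = 0.3, 0.5
    print(rate_legendre(a, p), rate_closed_form(a, p))
    for n in (100, 500, 2000):
        print(n, -log_tail_prob(n, p, a) / n)
```

The printed values of $-\frac{1}{n}\log P(M_n \ge a)$ should approach the rate function from above as $n$ grows, with a correction of order $(\log n)/n$.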

As the next step, we give a satisfactory formulation for estimating probability distri- 
butions. 
Definition 2.1. A family of probability distributions $\{p(x|w) \mid w \in W\}$ (with a compact
set $W \subset \mathbb{R}^d$) is called a (statistical) model if it satisfies the condition that the set
$\{x \in \mathbb{R}^N \mid p(x|w) > 0\}$ is independent of $w \in W$.

Definition 2.2. The probability distribution $p^n_{\pi,\beta}(x|x^n)$ defined below is called a Bayesian
escort predictive distribution:

$p^n_{\pi,\beta}(x|x^n) = \langle p(x|w) \rangle^n_{\pi,\beta} = \dfrac{\int p(x|w) \prod_{j=1}^{n} p(x_j|w)^\beta\, \pi(w)\, dw}{\int \prod_{j=1}^{n} p(x_j|w)^\beta\, \pi(w)\, dw}$, (5)

where $\pi(w)$ is a probability distribution (p.d., for short) on $W$ and $\beta > 0$.
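A minimal numerical sketch of formula (5) may help to fix ideas. It assumes a Bernoulli model $p(x|w) = w^x (1-w)^{1-x}$, a uniform prior $\pi$, and a midpoint grid on $(0,1)$ replacing the integral over $W$; none of these choices, nor the names `likelihood`, `escort_predictive`, `grid`, `prior` and `data`, come from the text. For $\beta = 1$ and a uniform prior the result should be close to Laplace's rule of succession $(k+1)/(n+2)$, where $k$ is the number of ones among the $n$ observations.

```python
import math

def likelihood(x, w):
    """Bernoulli model p(x|w) = w^x (1-w)^(1-x) for x in {0, 1} (illustrative choice)."""
    return w if x == 1 else 1.0 - w

def escort_predictive(x, data, grid, prior, beta):
    """Discretized version of (5): posterior weights use the likelihood raised to beta."""
    weights = [
        pi * math.prod(likelihood(xj, w) ** beta for xj in data)
        for w, pi in zip(grid, prior)
    ]
    numerator = sum(likelihood(x, w) * wt for w, wt in zip(grid, weights))
    return numerator / sum(weights)

# Midpoint grid on (0, 1); the endpoints are excluded so that the support of
# p(.|w) does not depend on w, as required in Definition 2.1.
K = 200
grid = [(i + 0.5) / K for i in range(K)]
prior = [1.0 / K] * K
data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical sample: 7 ones, 3 zeros

if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0):
        print(beta, escort_predictive(1, data, grid, prior, beta))
```

Larger values of $\beta$ concentrate the weights more sharply around the parameter values favoured by the data, which moves the prediction away from the prior mean.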

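We denote by $M_1(\Sigma)$ the space of Borel probability measures on a Polish space $\Sigma$.
We define the relative entropy of the probability measure $\nu \in M_1(\Sigma)$ with respect to
$\mu \in M_1(\Sigma)$ as

$D(\nu\|\mu) = \begin{cases} \int d\nu(\rho) \log \dfrac{d\nu}{d\mu}(\rho) & (\nu \ll \mu), \\ +\infty & (\text{otherwise}). \end{cases}$ (6)

If there exists a probability measure $\sigma \in M_1(\Sigma)$ such that $\nu, \mu \ll \sigma$, $D(\nu\|\mu)$ is also
denoted by $D(q\|p)$ where $q := \dfrac{d\nu}{d\sigma}$ and $p := \dfrac{d\mu}{d\sigma}$.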

Theorem 2.3. For $r(\cdot|x^n)$ as a p.d.-valued function $x^n = \{x_1, \cdots, x_n\} \mapsto r(\cdot|x^n)$, its risk
function $R^n(p\|r)$ defined by

$R^n(p\|r) = \int\!\!\int D(p(\cdot|w)\|r(\cdot|x^n)) \prod_{j=1}^{n} p(x_j|w)^\beta\, dx_j\, \pi(w)\, dw$ (7)

is minimized by the Bayesian escort predictive distribution $p^n_{\pi,\beta}(\cdot|x^n)$.

Proof. See [33]. □

While there are some more items to be treated in statistical inference, those appearing
in the next subsection are essentially all that we need in the second level.



2.3 2nd Level: States and Representations 

In order to proceed to the second level where states of the algebra of observables are the
target to be evaluated, we need to prepare a certain advanced operator-algebraic setting
which is provided by Tomita's theorem of integral decomposition of states. For this
purpose, we first review the notion of sectors. For a C*-algebra $\mathfrak{A}$ let $E_\mathfrak{A}$ be the set of its
states defined by normalized positive linear functionals on $\mathfrak{A}$. A state $\omega \in E_\mathfrak{A}$ is called a
factor state if the von Neumann algebra $\pi_\omega(\mathfrak{A})''$ corresponding to the GNS representation
$\{\pi_\omega, \mathfrak{H}_\omega, \Omega_\omega\}$ is a factor with a trivial center: $\mathfrak{Z}_\omega(\mathfrak{A}) := \pi_\omega(\mathfrak{A})'' \cap \pi_\omega(\mathfrak{A})' = \mathbb{C}1_{\mathfrak{H}_\omega}$. We denote
by $F_\mathfrak{A}$ the set of all factor states of $\mathfrak{A}$. If $\pi$ is a representation of $\mathfrak{A}$, then a state $\omega$ of $\mathfrak{A}$
is said to be $\pi$-normal if there exists a normal state $\rho$ of $\pi(\mathfrak{A})''$ such that

$\omega(A) = \rho(\pi(A))$ (8)

for all $A \in \mathfrak{A}$. Two representations $\pi_1$ and $\pi_2$ of a C*-algebra $\mathfrak{A}$ are said to be
quasi-equivalent and written as $\pi_1 \approx \pi_2$, if each $\pi_1$-normal state is $\pi_2$-normal and vice versa.
Definition 2.3 ([23]). A sector of a C*-algebra $\mathfrak{A}$ is defined by a quasi-equivalence class of
factor states of $\mathfrak{A}$.

If $\{\pi, \mathfrak{H}\}$ is a representation of a C*-algebra $\mathfrak{A}$, and $n$ is a cardinal, let $n\pi$ denote the
representation of $\mathfrak{A}$ on $\mathfrak{H}^{\oplus n} = \bigoplus_{k=1}^{n} \mathfrak{H}$ defined by

$n\pi(A) \left( \bigoplus_{k=1}^{n} \psi_k \right) = \bigoplus_{k=1}^{n} (\pi(A)\psi_k)$. (9)

By the following standard theorem in the representation theory of operator algebras,
quasi-equivalence between two representations $\pi_1$ and $\pi_2$ can be seen as the isomorphism
between the corresponding von Neumann algebras, $\pi_1(\mathfrak{A})''$ and $\pi_2(\mathfrak{A})''$, or as the unitary
equivalence of $\pi_1$ and $\pi_2$ up to multiplicity:

Theorem 2.4 (see [5]). Let $\mathfrak{A}$ be a C*-algebra and let $\{\pi_1, \mathfrak{H}_1\}$ and $\{\pi_2, \mathfrak{H}_2\}$ be
nondegenerate representations of $\mathfrak{A}$. The following are equivalent:

(1) $\pi_1 \approx \pi_2$;

(2) there exists an isomorphism $\tau : \pi_1(\mathfrak{A})'' \to \pi_2(\mathfrak{A})''$ such that $\tau(\pi_1(A)) = \pi_2(A)$ for all
$A \in \mathfrak{A}$;

(3) there exist cardinals $n$, $m$, projections $E \in m\pi_1(\mathfrak{A})'$, $F \in n\pi_2(\mathfrak{A})'$ and unitary elements
$U : \mathfrak{H}_1 \to F(\mathfrak{H}_2^{\oplus n})$, $V : \mathfrak{H}_2 \to E(\mathfrak{H}_1^{\oplus m})$ such that

$U\pi_1(A)U^* = n\pi_2(A)F$,
$V\pi_2(A)V^* = m\pi_1(A)E$

for all $A \in \mathfrak{A}$;

(4) there exists a cardinal $n$ such that $n\pi_1 \cong n\pi_2$, i.e., $\pi_1$ and $\pi_2$ are unitarily equivalent
up to multiplicity.

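The Gel'fand spectrum $\mathrm{Spec}(\mathfrak{Z}_\omega(\mathfrak{A}))$ of the center $\mathfrak{Z}_\omega(\mathfrak{A})$ is then identified with a factor
spectrum $\overset{\frown}{\mathfrak{A}}$ of $\mathfrak{A}$:

$\mathrm{Spec}(\mathfrak{Z}_\omega(\mathfrak{A})) = \overset{\frown}{\mathfrak{A}} := F_\mathfrak{A}/\!\approx$ : factor spectrum.

The center $\mathfrak{Z}_\omega(\mathfrak{A})$ and the factor spectrum $\overset{\frown}{\mathfrak{A}}$ play the role of the abelian algebra of
macroscopic order parameters to specify sectors and the classifying space of sectors to
distinguish among different sectors, respectively [23].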

As already mentioned, we need to treat states as objects to be estimated in the
second level of LDP, which means the necessity for "states to be treated as observables".
The notion of state-valued random variables required for this purpose can successfully be
formulated by the use of Tomita's theorem for orthogonal decompositions of states by
barycentric orthogonal measures, whose special case of central decompositions [22, 23] is
seen to be particularly useful for our purposes of statistical inference of state estimate.
Now the orthogonality $\omega_1 \perp \omega_2$ of positive linear functionals $\omega_j \in \mathfrak{A}^*_+$ and the orthogonal
measures $\mu$ on the state space $E_\mathfrak{A}$ of a C*-algebra $\mathfrak{A}$ are defined, respectively, as follows
(see [5]):

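Definition 2.4. If $\omega_1, \omega_2 \in \mathfrak{A}^*_+$ satisfy any of the following three equivalent conditions,
they are said to be orthogonal and we write $\omega_1 \perp \omega_2$:

1. if $\omega' \le \omega_1$ and $\omega' \le \omega_2$ for $\omega' \in \mathfrak{A}^*_+$, then $\omega' = 0$;

2. there is a projection $P \in \pi_\omega(\mathfrak{A})'$ s.t. $\omega_1(A) = \langle P\Omega_\omega, \pi_\omega(A)\Omega_\omega \rangle$ and
$\omega_2(A) = \langle (1 - P)\Omega_\omega, \pi_\omega(A)\Omega_\omega \rangle$;

3. the representation associated with $\omega = \omega_1 + \omega_2$ is a direct sum of the representations
associated with $\omega_1$ and $\omega_2$,

$\mathfrak{H}_\omega = \mathfrak{H}_{\omega_1} \oplus \mathfrak{H}_{\omega_2}$, $\pi_\omega = \pi_{\omega_1} \oplus \pi_{\omega_2}$, $\Omega_\omega = \Omega_{\omega_1} \oplus \Omega_{\omega_2}$.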



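Definition 2.5. A positive regular Borel measure $\mu$ on $E_\mathfrak{A}$ is defined to be an orthogonal
measure on $E_\mathfrak{A}$ if it satisfies for any Borel set $S \subset E_\mathfrak{A}$ the condition

$\int_S \rho\, d\mu(\rho) \perp \int_{E_\mathfrak{A} \setminus S} \rho\, d\mu(\rho)$. (10)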

The important properties characteristic to these notions can be found in the following 
theorem due to Tomita: 

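Theorem 2.5 (Tomita's theorem, see [5]). Let $\mathfrak{A}$ be a C*-algebra and $\omega$ be a state on $\mathfrak{A}$.
There exists a one-to-one correspondence between the following three sets:

(1) the orthogonal measures $\mu$ with barycenter $\omega = \int \rho\, d\mu(\rho)$;

(2) the abelian von Neumann subalgebras $\mathfrak{B} \subset \pi_\omega(\mathfrak{A})'$;

(3) the orthogonal projections $P$ on $\mathfrak{H}_\omega$ such that

$P\Omega_\omega = \Omega_\omega$, $P\pi_\omega(\mathfrak{A})P \subset \{P\pi_\omega(\mathfrak{A})P\}'$.

If $\mu$, $\mathfrak{B}$, $P$ are in correspondence, one has the following relations:

(1) $\mathfrak{B} = \{\pi_\omega(\mathfrak{A}) \cup P\}'$;

(2) $P = [\mathfrak{B}\Omega_\omega]$;

(3) $\mu(\hat{A}_1 \hat{A}_2 \cdots \hat{A}_n) = \langle \Omega_\omega, \pi_\omega(A_1)P\pi_\omega(A_2)P \cdots P\pi_\omega(A_n)\Omega_\omega \rangle$;

(4) $\mathfrak{B}$ is *-isomorphic to the range of the map $L^\infty(\mu) := L^\infty(E_\mathfrak{A}, \mu) \ni f \mapsto \kappa_\mu(f) \in \pi_\omega(\mathfrak{A})'$
defined by

$\langle \Omega_\omega, \kappa_\mu(f)\pi_\omega(A)\Omega_\omega \rangle = \int f(\rho)\hat{A}(\rho)\, d\mu(\rho)$ (11)

and for $A, B \in \mathfrak{A}$

$\kappa_\mu(\hat{A})\pi_\omega(B)\Omega_\omega = \pi_\omega(B)P\pi_\omega(A)\Omega_\omega$, (12)

where the map $\mathfrak{A} \ni A \mapsto \hat{A} \in L^\infty(\mu)$ is defined by $\hat{A} := (E_\mathfrak{A} \ni \rho \mapsto \rho(A))$.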

The above measure $\mu$ is called a barycentric measure of the state $\omega$ which is, in turn,
called the barycenter $\omega = b(\mu) := \int \rho\, d\mu(\rho)$ of $\mu$. The set of orthogonal probability
measures $\mu$ on $E_\mathfrak{A}$ with barycenter $\omega$ is denoted by $\mathcal{O}_\omega(E_\mathfrak{A})$. In reference to the abelian
von Neumann algebra $\mathfrak{B}$, we also denote the measure $\mu$ by $\mu_\mathfrak{B}$. We add here the following
observation to extend the essential contents of the Gel'fand isomorphism for commutative
C*-algebras to the non-commutative situation: the image $\hat{\mathfrak{A}} := \{\hat{A} \mid A \in \mathfrak{A}\}$ of the map
$\mathfrak{A} \ni A \mapsto \hat{A} \in L^\infty(\mu)$ is contained in the universal enveloping von Neumann algebra
$\mathfrak{A}^{**}$ of $\mathfrak{A}$ and constitutes a C*-algebra of measure-theoretical random variables equipped
with a linear structure $(\alpha\hat{A} + \beta\hat{B})(\rho) := \widehat{(\alpha A + \beta B)}(\rho)$ $(\alpha, \beta \in \mathbb{C})$, a non-commutative
convolution product defined by $(\hat{A} * \hat{B})(\rho) := \widehat{AB}(\rho)$, and the norm $\|\cdot\|$ given by

$\|\hat{A}\| = \sup_{\|\rho\| \le 1} |\hat{A}(\rho)|$. (13)
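Definition 2.6. If the algebra $\mathfrak{B}$ corresponding to $\mu$ is a subalgebra of the center $\mathfrak{Z}_\omega(\mathfrak{A})$
of the GNS representation $\pi_\omega$ of $\omega$, the orthogonal measure $\mu = \mu_\mathfrak{B} \in \mathcal{O}_\omega(E_\mathfrak{A})$ is called a
subcentral measure of $\omega$ satisfying the condition that, for any Borel set $\Delta \subset E_\mathfrak{A}$, the pair of
subrepresentations, $\int^\oplus_\Delta \pi_\rho\, d\mu(\rho)$ and $\int^\oplus_{E_\mathfrak{A} \setminus \Delta} \pi_\rho\, d\mu(\rho)$, of $\pi_\omega$ are disjoint in the sense of the
absence of non-zero intertwiners. In the case of $\mathfrak{B} = \mathfrak{Z}_\omega(\mathfrak{A})$, the corresponding subcentral
measure is called the central measure of $\omega$ and denoted by $\mu_\omega := \mu_{\mathfrak{Z}_\omega(\mathfrak{A})} \in \mathcal{O}_\omega(E_\mathfrak{A})$.

Since $\kappa_{\mu_\omega}$ is a *-algebraic embedding of $L^\infty(\mu_\omega)$, we can define a projection-valued
measure (PVM) $E_\omega : (\mathcal{B}(\mathrm{supp}\,\mu_\omega) \ni \Delta \mapsto E_\omega(\Delta) \in \mathrm{Proj}(\mathfrak{Z}_\omega(\mathfrak{A})))$ on Borel subsets
$\Delta \in \mathcal{B}(\mathrm{supp}\,\mu_\omega)$ of the state space $E_\mathfrak{A}$ by $E_\omega(\Delta) := \kappa_{\mu_\omega}(\chi_\Delta) \in \mathrm{Proj}(\mathfrak{Z}_\omega(\mathfrak{A}))$, which
satisfies

$\langle \Omega_\omega, E_\omega(\Delta)\Omega_\omega \rangle = \mu_\omega(\Delta)$. (14)

Here the indicator function $\chi_\Delta$ for a subset $\Delta$ of $E_\mathfrak{A}$ is defined as usual:

$\chi_\Delta(\rho) = \begin{cases} 1 & (\rho \in \Delta), \\ 0 & (\rho \notin \Delta). \end{cases}$

In this way, states $\rho$ on $\mathrm{supp}(\mu_\omega)$ constitute a random variable on the central spectrum,
and each element $\kappa_{\mu_\omega}(f) \in \kappa_{\mu_\omega}(L^\infty(\mu_\omega)) = \mathfrak{B} = \mathfrak{Z}_\omega(\mathfrak{A})$ can be expressed as

$\kappa_{\mu_\omega}(f) = \int f(\rho)\, dE_\omega(\rho)$. (15)

Therefore, the center $\mathfrak{Z}_\omega(\mathfrak{A})$ of $\mathfrak{A}$ can be seen as an algebra consisting of non-linear
functions of states.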

When the methods discussed in this section are applied to practical situations, it will
be safe and also sufficient for us to restrict ourselves to such cases that the support of the
barycentric measure $\mu_\omega$ is a compact subset $B$ in the factor spectrum of $\mathfrak{A}$:



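where $\{\rho_\xi \mid \xi \in \Xi$ : an order parameter$\} \subset F_\mathfrak{A}$. Here the factor spectrum $F_\mathfrak{A}$ of $\mathfrak{A}$ means the
subset of the state space $E_\mathfrak{A}$ consisting of all the factor states $\psi$ whose GNS representations
have trivial centers: $\mathfrak{Z}_\psi(\mathfrak{A}) = \pi_\psi(\mathfrak{A})'' \cap \pi_\psi(\mathfrak{A})' = \mathbb{C}1_{\mathfrak{H}_\psi}$.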

2.3.1 Mathematical and statistical basis 

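Let $\mathfrak{A}$ be a separable C*-algebra and $\psi$ be a state on $\mathfrak{A}$. Then $E_\mathfrak{A}$ is weak *-compact and
metrizable by the metric

$d(\omega_1, \omega_2) := \sum_{j=1}^{\infty} 2^{-j} \dfrac{|\omega_1(A_j) - \omega_2(A_j)|}{\|A_j\|}$, (17)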

where the set $\{A_j \in \mathfrak{A} \mid A_j \neq 0, j = 1, 2, \cdots\}$ is a dense subset of $\mathfrak{A}$. Thus $\mathrm{supp}\,\mu_\psi$ of
the central measure $\mu_\psi$ of $\psi \in E_\mathfrak{A}$ is compact in the weak *-topology and $(\mathrm{supp}\,\mu_\psi)^{\mathbb{N}}$
is also compact by Tikhonov's theorem. For $\rho = (\rho_1, \rho_2, \cdots) \in (\mathrm{supp}\,\mu_\psi)^{\mathbb{N}}$, we define
$Y_j(\rho) = \rho_j$. Each $\rho_j$ is a factor state because $\mu_\psi$ is supported by the closed subset of $F_\mathfrak{A}$,
the set of factor states. Then $\{Y_j\}_{j=1}^{\infty}$ is seen to be a set of $(\mathrm{supp}\,\mu_\psi)$-valued random
variables satisfying the following condition:
Matching Condition 2. $\{Y_j\}$ are independent identically distributed ("i.i.d.") random
variables.

We denote by $M_1(\Sigma)$ the space of Borel probability measures on a Polish space $\Sigma$, and
by $B(\Sigma)$ the vector space of all bounded Borel measurable functions on $\Sigma$, respectively.
For $\phi \in B(\Sigma)$, let $\tau_\phi : M_1(\Sigma) \to \mathbb{R}$ be defined by $\tau_\phi(\nu) = \langle \phi, \nu \rangle = \int_\Sigma \phi\, d\nu$. We denote by
$\mathcal{B}^{cy}(M_1(\Sigma))$ the $\sigma$-field of cylinder sets on $M_1(\Sigma)$, i.e., the smallest $\sigma$-field that makes all
$\{\tau_\phi\}$ measurable (see [7]).

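For any $\tilde\rho \in (\mathrm{supp}\,\mu_\psi)^{\mathbb{N}}$, $\Delta \in \mathcal{B}(\mathrm{supp}\,\mu_\psi)$ and $\Gamma \in \mathcal{B}^{cy}(M_1(E_{\mathfrak{A}}))$, we define the
empirical measures

$$L_n(\tilde\rho, \Delta) := \frac{1}{n} \sum_{j=1}^{n} \delta_{Y_j(\tilde\rho)}(\Delta) \tag{18}$$

and

$$Q_n^{(2)}(\Gamma) := P_{\mu_\psi}(L_n \in \Gamma). \tag{19}$$

The next theorem (HOT83) is the key to proving LDP.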



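Theorem 2.6 (HOT83). Let $\mu, \nu$ be regular Borel probability measures on $E_{\mathfrak{A}}$ with
barycenters $\omega, \psi \in E_{\mathfrak{A}}$. If there is a subcentral measure $m$ on $E_{\mathfrak{A}}$ such that $\mu, \nu \ll m$,
then $S(\psi\|\omega) = D(\nu\|\mu)$.

This theorem enables us to evaluate the quantum relative entropy $S(\psi\|\omega)$ as the
measure-theoretical relative entropy $D(\nu\|\mu)$.

Theorem 2.7. Let $\mathfrak{A}$ be a separable C*-algebra and $\psi$ be a state on $\mathfrak{A}$. Then $Q_n^{(2)}$ satisfies
LDP with the rate function $S(b(\cdot)\|\psi)$:

$$-\inf_{\nu \in \Gamma^\circ} S(b(\nu)\|\psi) \le \liminf_{n\to\infty} \frac{1}{n} \log Q_n^{(2)}(\Gamma) \le \limsup_{n\to\infty} \frac{1}{n} \log Q_n^{(2)}(\Gamma) \le -\inf_{\nu \in \overline{\Gamma}} S(b(\nu)\|\psi) \tag{20}$$

for any $\Gamma \in \mathcal{B}^{cy}(M_1(E_{\mathfrak{A}}))$. In the case that, for $\Gamma \in \mathcal{B}^{cy}(M_1(E_{\mathfrak{A}}))$, $\{\nu \in \Gamma^\circ \mid \nu \ll \mu_\psi\}$
and $\{\nu \in \overline{\Gamma} \mid \nu \ll \mu_\psi\}$ are empty, $\inf_{\nu \in \Gamma^\circ} S(b(\nu)\|\psi)$ and $\inf_{\nu \in \overline{\Gamma}} S(b(\nu)\|\psi)$ are defined as
infinity, respectively.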

Proof. $(E_{\mathfrak{A}}, d)$ with the metric $d$ defined by (17) is a compact metric space, so is a Polish
one. Therefore, we can apply Sanov's theorem [7] for $Q_n^{(2)}(\Gamma)$ to prove this theorem by
using Theorem 2.6 (HOT83). □
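As a hedged numerical illustration of the classical Sanov-type statement behind Theorem 2.7, the following sketch assumes a measure supported on only two points (an assumption made purely for this example), so that the empirical measure $L_n$ is determined by a binomial count. It compares $-\frac{1}{n}\log P(L_n \in \Gamma)$ with $\inf_{\nu\in\Gamma} D(\nu\|\mu)$ for $\Gamma = \{\nu : \nu(1) \ge a\}$ with $a > \mu(1)$. The function names are illustrative and do not come from the paper.

```python
import math


def kl_bernoulli(a: float, p: float) -> float:
    """Relative entropy D(nu || mu) of two-point measures with nu(1) = a, mu(1) = p."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


def log_binom_pmf(n: int, k: int, p: float) -> float:
    """Logarithm of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def empirical_rate(n: int, p: float, a: float) -> float:
    """-(1/n) log P(L_n(1) >= a) for n i.i.d. draws with mu(1) = p (log-sum-exp)."""
    logs = [log_binom_pmf(n, k, p) for k in range(math.ceil(n * a), n + 1)]
    top = max(logs)
    return -(top + math.log(sum(math.exp(v - top) for v in logs))) / n


if __name__ == "__main__":
    p, a = 0.3, 0.5  # a > p, so the infimum of D over Gamma sits at nu(1) = a
    for n in (100, 1000, 10000):
        print(n, round(empirical_rate(n, p, a), 5), round(kl_bernoulli(a, p), 5))
```

The finite-$n$ rate stays above the relative entropy and approaches it as $n$ grows, which is the behaviour the two-sided bound (20) describes in this simple case.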

In Appendix A a generalization of this theorem will be discussed, but we should first 
justify the use of generic barycentric measures in the context of statistical inference. 

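Definition 2.7. A family of states $\{\omega_\theta \mid \theta \in \Theta\}$ parametrized by a compact set $\Theta$ in $\mathbb{R}^d$ is
called a (statistical) model if it satisfies the following three conditions:

(i) There is a subcentral measure $m$ on $E_{\mathfrak{A}}$ such that $\mu_{\omega_\theta} \ll m$ for every $\theta \in \Theta$.

(ii) The set $\{\rho \in E_{\mathfrak{A}} \mid p_\theta(\rho) := \frac{d\mu_{\omega_\theta}}{dm}(\rho) > 0\}$ is independent of $\theta \in \Theta$.

(iii) $\omega_\theta$ is Bochner integrable.

Definition 2.8. For a given model $\{\omega_\theta\}_{\theta \in \Theta}$ and a probability distribution $\pi(\theta)$ of $\theta$, a state
$\omega^n_{\pi,\beta}$ defined by

$$\omega^n_{\pi,\beta} := \frac{\int \omega_\theta \prod_{j=1}^{n} p_\theta(\rho_j)^\beta\, \pi(\theta)\, d\theta}{\int \prod_{j=1}^{n} p_\theta(\rho_j)^\beta\, \pi(\theta)\, d\theta} \tag{21}$$

is called a Bayesian escort predictive state, where $\rho^n := \{\rho_1, \dots, \rho_n\}$ and $\beta > 0$.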

We obtain the following important theorem: 

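Theorem 2.8. For $\varphi^{\rho^n}$ as a state-valued function $\rho^n = \{\rho_1, \dots, \rho_n\} \mapsto \varphi^{\rho^n} \in E_{\mathfrak{A}}$, its risk
function $T^n(\omega_\theta\|\varphi^{\rho^n})$ defined by

$$T^n(\omega_\theta\|\varphi^{\rho^n}) := \frac{\int\!\!\int S(\omega_\theta\|\varphi^{\rho^n}) \prod_{j=1}^{n} p_\theta(\rho_j)^\beta\, dm(\rho_j)\, \pi(\theta)\, d\theta}{\int\!\!\int \prod_{j=1}^{n} p_\theta(\rho_j)^\beta\, dm(\rho_j)\, \pi(\theta)\, d\theta}, \tag{22}$$

is minimized by the Bayesian escort predictive state $\omega^n_{\pi,\beta}$.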

Proof. For any measure $\mu = \mu^{\rho^n}$ on $E_{\mathfrak{A}}$ with $\varphi^{\rho^n}$ as its barycenter such that $\mu \ll m$, we
have



^ / / I I Pe{Pjrdm{pj)7r{e)de, 



- fdm^-^( 
J dm \ 



log^-log^), 
am am 



and hence, 

A(T«(c^,||0f)-T"(a;,||0f)) 




i=i 



^ // *" (/ '-t n-»(-^)^-('')<*') ('°^ - n 

for any <p'l = b{fi'l ),02 = b{P'2 ) ^ -^st such that Pi,P2 ^ Now we put 

T := p., f[p9(p,)^7r(^)d^ with ^ := y nPe(p,)^7r(^)(i^. If ^ is equal to 



then the above equalities continue to the following form: 



ffD{r\\nC)f[dm{pj) 
J J j=i 

^B JJs{b{T)Uf)f[dm{pj)>0 



for any $\varphi^{\rho^n} \in E_{\mathfrak{A}}$. Therefore, the risk function $T^n(\omega_\theta\|\varphi^{\rho^n})$ is minimized at the unique
state $\varphi^{\rho^n} = b(\tau) = \omega^n_{\pi,\beta}$. □
This theorem is a generalization of [1] and [29], and explains the reason why the
Bayesian escort predictive state is a good estimator for a "true" one.
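To make the construction of the Bayesian escort predictive state concrete, the following minimal sketch works in a purely classical, finite setting: a finite parameter set, densities on a finite sample space with common support (cf. condition (ii) of Definition 2.7), and an inverse temperature $\beta$. The data layout and the names `escort_weights` and `escort_predictive` are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def escort_weights(model: Sequence[Sequence[float]], prior: Sequence[float],
                   data: Sequence[int], beta: float) -> List[float]:
    """Posterior weights proportional to prior(theta) * prod_j p_theta(x_j)**beta.

    model[t][x] is the (strictly positive) probability of outcome x under parameter t.
    """
    log_w = [math.log(pi) + beta * sum(math.log(p_theta[x]) for x in data)
             for p_theta, pi in zip(model, prior)]
    top = max(log_w)                      # subtract the maximum for numerical stability
    w = [math.exp(v - top) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def escort_predictive(model: Sequence[Sequence[float]], prior: Sequence[float],
                      data: Sequence[int], beta: float) -> List[float]:
    """Mixture of the model densities with the escort posterior weights."""
    w = escort_weights(model, prior, data, beta)
    n_outcomes = len(model[0])
    return [sum(wi * p[x] for wi, p in zip(w, model)) for x in range(n_outcomes)]


if __name__ == "__main__":
    model = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]
    prior = [1 / 3, 1 / 3, 1 / 3]
    print(escort_predictive(model, prior, data=[0, 0, 0, 1], beta=1.0))
```

With $\beta = 1$ this reduces to the usual Bayesian predictive distribution; other values of $\beta$ temper the likelihood before averaging.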

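Now we discuss the situations with singular statistics. The results here are proved
originally in [33, 35]. The reason why we use this method will be explained in Section
4. For a model $\{\omega_\theta\}_{\theta \in \Theta}$ and a "true" state $\psi \in E_{\mathfrak{A}}$ we assume that there is a subcentral
measure $m$ satisfying $\mu_\psi, \mu_{\omega_\theta} \ll m$ and

$$\Big\{\rho \in E_{\mathfrak{A}} \,\Big|\, p(\rho|\theta) := p_\theta(\rho) = \frac{d\mu_{\omega_\theta}}{dm}(\rho) > 0\Big\} = \Big\{\rho \in E_{\mathfrak{A}} \,\Big|\, q(\rho) := \frac{d\mu_\psi}{dm}(\rho) > 0\Big\}$$

for every $\theta \in \Theta$, and we consider

$$L(\theta) := -\int dm(\rho)\, q(\rho) \log p(\rho|\theta). \tag{23}$$

We assume that there exists at least one parameter $\theta_0 \in \Theta$ that minimizes $L(\theta)$,

$$L_0 = \min_{\theta \in \Theta} L(\theta), \tag{24}$$

and that $p_0(\rho) := p(\rho|\theta_0)$ is one and the same density function for any $\theta_0 \in \Theta_0 := \{\theta \in \Theta \mid L(\theta) = L_0\}$,
and we put $\omega_0 := \omega_{\theta_0}$. Then, from such definitions as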

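$$f(\rho, \theta) := \log \frac{p_0(\rho)}{p(\rho|\theta)}, \tag{25}$$

$$D(\theta) := \int dm(\rho)\, q(\rho) f(\rho, \theta), \tag{26}$$

$$D_n(\theta) := \frac{1}{n} \sum_{j=1}^{n} f(\rho_j, \theta), \tag{27}$$

it immediately follows that

$$D(\theta) = S(\psi\|\omega_\theta) - S(\psi\|\omega_0), \tag{28}$$
$$D_n(\theta) = S_n(\psi\|\omega_\theta) - S_n(\psi\|\omega_0), \tag{29}$$

where $S_n(\psi\|\omega_\theta) = \frac{1}{n} \sum_{j=1}^{n} \log \frac{q(\rho_j)}{p(\rho_j|\theta)}$. Therefore, we see $D(\theta) \ge 0$.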
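For the reader's convenience, the step from the definitions to (28) and to $D(\theta) \ge 0$ can be spelled out as follows. This is only a sketch: it uses definitions (23)-(26) together with Theorem 2.6 and assumes that all integrals involved are finite.

```latex
\begin{align*}
D(\theta)
  &= \int dm(\rho)\, q(\rho) \log\frac{p_0(\rho)}{p(\rho|\theta)}
   = \int dm(\rho)\, q(\rho) \log\frac{q(\rho)}{p(\rho|\theta)}
   - \int dm(\rho)\, q(\rho) \log\frac{q(\rho)}{p_0(\rho)} \\
  &= D(\mu_\psi \| \mu_{\omega_\theta}) - D(\mu_\psi \| \mu_{\omega_0})
   = S(\psi \| \omega_\theta) - S(\psi \| \omega_0), \\
D(\theta)
  &= L(\theta) - L_0 \;\ge\; 0 \qquad \text{since } L_0 = \min_{\theta \in \Theta} L(\theta).
\end{align*}
```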

Assumptions. (1) The open kernel $\Theta^\circ$ of the set $\Theta$ of parameters $\theta$ is non-empty. The
boundary of $\Theta$ is defined by real analytic functions $g_j(\theta)$ so that

$$\Theta = \{\theta \in \mathbb{R}^d \mid g_1(\theta) \ge 0,\ g_2(\theta) \ge 0,\ \dots,\ g_k(\theta) \ge 0\}. \tag{30}$$

(2) The a priori distribution $\pi(\theta)$ is factorized into the product,

$$\pi(\theta) = \pi_1(\theta)\pi_2(\theta), \tag{31}$$

of a real analytic function $\pi_1(\theta) \ge 0$ and of a function of $C^\infty$-class $\pi_2(\theta) > 0$.

(3) The map $\Theta \ni \theta \mapsto f(\rho, \theta)$ is an $L^s(q)$-valued analytic function, where $L^s(q)$ with
$s \ge 6$ is defined by

$$L^s(q) := \Big\{ f(\rho) \,\Big|\, \|f\|_s := \Big( \int |f(\rho)|^s q(\rho)\, dm(\rho) \Big)^{1/s} < \infty \Big\}. \tag{32}$$

(4) There is an $\epsilon > 0$ such that



[ (snp\fip,e)A I sup pip\e)] dm{p) < oo. (33) 

J \6»G0 / \Die)<e ) 

The pair $(\psi, \omega_0)$ is said to be coherent if there exist $A > 0$ and $\epsilon > 0$ such that

$$\theta \in \Theta_\epsilon \ \Rightarrow\ S(\psi\|\omega_\theta) - S(\psi\|\omega_0) \ge A \cdot S(\omega_0\|\omega_\theta), \tag{34}$$

where $\Theta_\epsilon := \{\theta \in \Theta \mid S(\omega_0\|\omega_\theta) < \epsilon\}$ and, otherwise, incoherent.

In the rest of this subsection, we assume the following condition to hold, in addition
to the above assumptions (1) - (4):

Matching Condition 3. The pair $(\psi, \omega_0)$ satisfies the coherence condition.

We note that the validity of these assumptions reflects the interplay between positivity
and analyticity, which is closely related to the modular structure inherent in the
standard form of a von Neumann algebra.

The inequality in (34) can be written as follows:

$$\int dm(\rho)\, q(\rho) \log \frac{p_0(\rho)}{p(\rho|\theta)} \ge A \cdot D(p_0\|p_\theta) \tag{35}$$

for every $\theta \in \Theta_\epsilon$, and the following inequality holds:

$$t + e^{-t} - 1 \ge B(\eta)\, t^2 \tag{36}$$

for $|t| < \eta$, where $B(\eta)$ is a monotone decreasing strictly positive function of $\eta > 0$. Thus,
by fixing $\eta$ sufficiently large, it holds that

$$L(\theta) - L_0 \ge C \int dm(\rho)\, p_0(\rho) f(\rho, \theta)^2 \quad (C > 0), \tag{37}$$

for every $\theta \in \Theta_\epsilon$. We can prove the next theorem by using the same methods as in [35]:

Theorem 2.9. By the resolution of singularities, the functions in Eqs. (26), (25), (27)
can be reduced to the following "standard forms":

$$D(g(u)) = u^{2k} = u_1^{2k_1} \cdots u_d^{2k_d}, \tag{38}$$
$$f(\rho, g(u)) = a(\rho, u)\, u^k, \tag{39}$$
$$D_n(g(u)) = u^{2k} - \frac{1}{\sqrt{n}}\, u^k\, \xi_n(u), \tag{40}$$

where $u = (u_1, \dots, u_d)$ is a coordinate system of an analytic manifold $U$, $g$ is an
analytic map from $U$ to $\Theta$, $k_1, \dots, k_d$ are non-negative integers, $a(\rho, u)$ is an analytic
function on $U$ for each $\rho \in \mathrm{supp}\,\mu_\psi$ such that $E_\rho[a(\rho, u)] = u^k$, and $\{\xi_n\}$ is an empirical
process such that

$$\xi_n(u) = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \{u^k - a(\rho_j, u)\} \tag{41}$$

converges weakly to the Gaussian process $\xi(u)$ with expectation $E_\xi[\xi(u)] = 0$ and covariance
$E_\xi[\xi(u)\xi(v)] = E_\rho[a(\rho, u)a(\rho, v)] - u^k v^k$.
The universal validity of the above standard forms of the quantum relative entropy
and of the log likelihood ratio function, respectively, is guaranteed independently of the
model by this theorem.

Furthermore, we need the following theorem proved in [33], as a next step to Theorem
2.9 under the above assumptions:



Theorem 2.10 ([33]). (1) The set of parameters $U = g^{-1}(\Theta_\epsilon)$ is covered by a finite set

$$U = \bigcup_\alpha U_\alpha,$$

where $U_\alpha$ is given by a local coordinate,

$$U_\alpha = [0, b]^d = \{(u_1, u_2, \dots, u_d) \mid 0 \le u_1, u_2, \dots, u_d \le b\}.$$

(2) In each $U_\alpha$,

$$D(g(u)) = u^{2k} = u_1^{2k_1} \cdots u_d^{2k_d},$$

where $k_1, \dots, k_d$ are non-negative integers.

(3) There is a function $\tilde\pi(u)$ of class $C^\infty$ such that

$$\pi(g(u))\,|g'(u)| = \tilde\pi(u)\, u^h = \tilde\pi(u)\, u_1^{h_1} u_2^{h_2} \cdots u_d^{h_d}, \tag{42}$$

where $|g'(u)|$ is the absolute value of the Jacobian determinant, $h_1, \dots, h_d$ are non-negative
integers, and $\tilde\pi(u) > c > 0$ with a positive constant $c$.

(4) There exists a set of functions $\{\sigma_\alpha(u)\}$ of class $C^\infty$ which satisfy

$$\sigma_\alpha(u) \ge 0, \quad \sum_\alpha \sigma_\alpha(u) = 1, \quad \sigma_\alpha(u) > 0\ (u \in [0, b)^d), \quad \mathrm{supp}\,\sigma_\alpha(u) = [0, b]^d,$$

such that, for an arbitrary integrable function $H(\theta)$,

$$\int_{\Theta_\epsilon} H(\theta)\pi(\theta)\, d\theta = \int_U H(g(u))\,\pi(g(u))\,|g'(u)|\, du = \sum_\alpha \int_{U_\alpha} H(g(u))\,\pi^*(u)\, u^h\, du,$$

where we define $\pi^*(u)$ by omitting the local coordinate $\alpha$,

$$\pi^*(u) = \sigma_\alpha(u)\tilde\pi(u).$$

Proof. See [33]. □
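Definition 2.9.

$$Z_n := \int \prod_{j=1}^{n} p(\rho_j|\theta)^\beta\, \pi(\theta)\, d\theta, \qquad Z_n^0 := \frac{Z_n}{\prod_{j=1}^{n} p_0(\rho_j)^\beta}, \tag{43}$$

$$F_n := -\frac{1}{\beta} \log Z_n, \qquad F_n^0 := -\frac{1}{\beta} \log Z_n^0. \tag{44}$$

$Z_n$ and $F_n$ are called a partition function and a Bayes stochastic complexity, respectively.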
The zeta function $\zeta(z) = \int D(\theta)^z \pi(\theta)\, d\theta$ can be analytically continued to the unique
meromorphic function on the entire complex plane. All poles of $\zeta(z)$ are real, negative,
rational numbers.

$$(-\lambda) := \text{maximum pole of } \zeta(z) \quad (\lambda > 0), \tag{45}$$
$$m := \text{multiplicity of } (-\lambda). \tag{46}$$

$\lambda$ and $m$ are called the learning coefficient and its order, respectively. If $D(\theta)$ and the a priori
distribution $\pi(\theta)$ are represented as in Theorems 2.9 and 2.10, then the learning coefficient
and its order are given, respectively, by

$$\lambda = \min_\alpha \min_{1 \le j \le d} \Big( \frac{h_j + 1}{2k_j} \Big), \tag{47}$$

$$m = \max_\alpha \#\{j \mid \lambda = (h_j + 1)/2k_j\}, \tag{48}$$

where $\#$ denotes the cardinality of the set. Let $\{\alpha^*\}$ be the set of all local coordinates in
which both the minimization in Eq. (47) and the maximization in Eq. (48) are attained.
Such a set of local coordinates $\{\alpha^*\}$ is said to be the essential family of local coordinates.
For each local coordinate $\alpha^*$ in the essential family of local coordinates, we assume without
loss of generality that $u$ is represented as $u = (x, y)$ so that

$$x = (u_1, u_2, \dots, u_m),$$
$$y = (u_{m+1}, u_{m+2}, \dots, u_d),$$

and that

$$\lambda = \frac{h_j + 1}{2k_j} \quad (1 \le j \le m),$$
$$\lambda < \frac{h_j + 1}{2k_j} \quad (m + 1 \le j \le d).$$

For any function $H(u) = H(x, y)$, we use the notation $H_0(y) := H(0, y)$.
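Formulas (47) and (48) are purely combinatorial once the exponents $k_j$ and $h_j$ of each local coordinate are known. The sketch below evaluates them with exact rational arithmetic. The chart representation (a pair of integer tuples per local coordinate) and the convention that a coordinate with $k_j = 0$ imposes no constraint are assumptions of this example, not statements from the paper.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

# One local coordinate: exponents k of D(g(u)) = u^{2k} and h of the prior factor u^h.
Chart = Tuple[Sequence[int], Sequence[int]]


def chart_minimum(k: Sequence[int], h: Sequence[int]) -> Tuple[Fraction, int]:
    """Return min_j (h_j + 1) / (2 k_j) over k_j > 0 and how many j attain it."""
    ratios = [Fraction(hj + 1, 2 * kj) for kj, hj in zip(k, h) if kj > 0]
    if not ratios:
        raise ValueError("chart has no coordinate with k_j > 0")
    lam = min(ratios)
    return lam, ratios.count(lam)


def learning_coefficient(charts: List[Chart]) -> Tuple[Fraction, int]:
    """Learning coefficient lambda (47) and its order m (48) over all charts."""
    per_chart = [chart_minimum(k, h) for k, h in charts]
    lam = min(value for value, _ in per_chart)
    # Only charts attaining the global minimum contribute to the multiplicity.
    order = max(count for value, count in per_chart if value == lam)
    return lam, order


if __name__ == "__main__":
    # Hypothetical example: D = u1^2 u2^2 with a prior that is regular at the origin.
    print(learning_coefficient([((1, 1), (0, 0))]))
```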

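Theorem 2.11. (1)

$$F_n^0 - \frac{\lambda}{\beta} \log n + \frac{m - 1}{\beta} \log\log n \to -\frac{1}{\beta} \log \Big( \sum_{\alpha^*} \gamma_b \int_0^\infty dt \int_{U_{\alpha^*}} t^{\lambda - 1} e^{-\beta t + \beta\sqrt{t}\,\xi_0(y)}\, \pi_0^*(y)\, dy \Big) \quad \text{in law}. \tag{49}$$

(2) The following asymptotic expansion holds:

$$F_n = nL_n + \frac{\lambda}{\beta} \log n - \frac{m - 1}{\beta} \log\log n + F_n^R, \tag{50}$$

where $L_n = -\frac{1}{n} \sum_{j=1}^{n} \log p_0(\rho_j)$, and $F_n^R$ is a random variable which converges in law to a
random variable.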
Proof. We can prove (1) easily by using the same method as used in [33]. We define

$$F_n^R := -\frac{1}{\beta} \log \Big( \sum_{\alpha^*} \gamma_b \int_0^\infty dt \int_{U_{\alpha^*}} t^{\lambda - 1} e^{-\beta t + \beta\sqrt{t}\,\xi_{n,0}(y)}\, \pi_0^*(y)\, dy \Big). \tag{51}$$

Then, (2) immediately follows from (1). □

This theorem clarifies the behaviour of the Bayes likelihood function, which evaluates
how closely a model approaches an optimal state as the amount of data increases.
Although there is no essential difference between this theorem and that in the classical case
[33], there is one reason why we described this result here: results such as Theorems
2.9, 2.10 and the next Theorem 2.12 are the standard objects of interest in modern
statistical science, and fundamental analysis in information theory is mainly based on
large-deviation type results similar to Theorem 2.11. Therefore, this theorem is of vital
importance and needs to be investigated in more detail in future, regardless of whether the
setting is quantum or classical.

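For a given function $G(\theta)$ on $\Theta$, the a posteriori mean of $G(\theta)$ is defined as

$$\langle G(\theta) \rangle^{\beta}_{\pi,\rho^n} = \frac{\int G(\theta) \prod_{j=1}^{n} p(\rho_j|\theta)^\beta\, \pi(\theta)\, d\theta}{\int \prod_{j=1}^{n} p(\rho_j|\theta)^\beta\, \pi(\theta)\, d\theta}, \tag{52}$$

where $0 < \beta < \infty$. Then, the following equality holds:

$$\omega^n_{\pi,\beta} = \int \rho\, \langle p(\rho|\theta) \rangle^{\beta}_{\pi,\rho^n}\, dm(\rho). \tag{53}$$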

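Definition 2.10.

(1) Bayes generalization error and Bayes generalization loss are defined, respectively, by

$$\mathcal{E}_{bg} := E_\rho \Big[ \log \frac{q(\rho)}{\langle p(\rho|\theta) \rangle^{\beta}_{\pi,\rho^n}} \Big], \qquad \mathcal{L}_{bg} := E_\rho \big[ -\log \langle p(\rho|\theta) \rangle^{\beta}_{\pi,\rho^n} \big]. \tag{54}$$

(2) Bayes training error and Bayes training loss are defined, respectively, by

$$\mathcal{E}_{bt} := \frac{1}{n} \sum_{j=1}^{n} \log \frac{q(\rho_j)}{\langle p(\rho_j|\theta) \rangle^{\beta}_{\pi,\rho^n}}, \qquad \mathcal{L}_{bt} := \frac{1}{n} \sum_{j=1}^{n} \big[ -\log \langle p(\rho_j|\theta) \rangle^{\beta}_{\pi,\rho^n} \big]. \tag{55}$$

(3) Functional variance is defined by

$$V := \sum_{j=1}^{n} \Big\{ \big\langle (\log p(\rho_j|\theta))^2 \big\rangle^{\beta}_{\pi,\rho^n} - \big( \langle \log p(\rho_j|\theta) \rangle^{\beta}_{\pi,\rho^n} \big)^2 \Big\}. \tag{56}$$

These notions are the main targets to be estimated or calculated in statistics and
learning theory. We can easily check that

$$\mathcal{E}_{bg} = D\big(q \,\big\|\, \langle p(\cdot|\theta) \rangle^{\beta}_{\pi,\rho^n}\big) = S(\psi\|\omega^n_{\pi,\beta}), \tag{57}$$
$$\mathcal{E}_{bg} = \mathcal{L}_{bg} + E_\rho[\log q(\rho)],$$
$$\mathcal{E}_{bt} = \mathcal{L}_{bt} + \frac{1}{n} \sum_{j=1}^{n} \log q(\rho_j).$$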

Our present concern is the following theorem.

Theorem 2.12.

$$E[\mathcal{L}_{bg}] = E[\mathrm{WAIC}] + o\Big(\frac{1}{n}\Big), \tag{58}$$

$$\mathrm{WAIC} = \mathcal{L}_{bt} + \frac{\beta}{n} V. \tag{59}$$

Proof. See [35]. □

WAIC is the acronym for "widely applicable information criteria". It is shown by
this theorem that the WAIC for a central measure is asymptotically equal to the Bayes
generalization loss. Since WAIC for $p_\theta = \frac{d\mu_{\omega_\theta}}{dm}$ is a quantum version of the information
criteria (IC), this result can be interpreted as establishing IC for quantum
states. This also justifies our use of the central measure $\mu_\omega$ for the central decomposition
of $\omega \in E_{\mathfrak{A}}$: namely, owing to the use of central decomposition, our LDS in the second
level can determine representations controlling spectra of observables, on the basis of
numerical data of such a quantity as WAIC. It is important, not only practically but also
conceptually, that such qualitative aspects as representations of the algebra of observables
can be estimated by this kind of quantitative data. In addition, WAIC in the quantum
case should be contrasted with that in the classical case, since the latter cannot evaluate
representations of the algebra. On the other hand, we note that IC in the first level are
the same as those in the classical case.
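In practice, the posterior means entering the Bayes training loss and the functional variance are often approximated by averaging over parameter draws. The following minimal sketch computes WAIC as training loss plus $(\beta/n)$ times the functional variance from a table of log-likelihood values; the Monte Carlo approximation, the input layout `log_lik[s][j]` (draw `s`, observation `j`), and the function names are assumptions of this example.

```python
import math
from typing import Sequence, Tuple


def _log_mean_exp(values: Sequence[float]) -> float:
    """Numerically stable log of the arithmetic mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def waic(log_lik: Sequence[Sequence[float]], beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (WAIC, Bayes training loss, functional variance).

    log_lik[s][j] = log p(x_j | theta_s), where theta_s are draws that are assumed
    to come from the posterior proportional to prod_j p(x_j | theta)**beta * prior.
    """
    n_draws, n = len(log_lik), len(log_lik[0])
    train_loss, func_var = 0.0, 0.0
    for j in range(n):
        column = [log_lik[s][j] for s in range(n_draws)]
        train_loss += -_log_mean_exp(column) / n          # -(1/n) sum_j log <p(x_j|theta)>
        mean = sum(column) / n_draws
        func_var += sum(v * v for v in column) / n_draws - mean * mean
    return train_loss + beta * func_var / n, train_loss, func_var


if __name__ == "__main__":
    draws = [[-1.0, -2.0, -1.5], [-1.2, -1.8, -1.4], [-0.9, -2.1, -1.6]]
    print(waic(draws, beta=1.0))
```

When all draws coincide, the functional variance vanishes and WAIC reduces to the training loss, so the correction term only matters when the posterior is spread out.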

2.3.2 Physical meaning and practical use 

We can conclude that we have established the following procedures: 

Rate function $\Rightarrow$ Predictive state $\Rightarrow$ Information criterion ($\Rightarrow$ "True" state).

The procedure established in Section 2.3.1 is a typical example of this:

Quantum relative entropy $S(\cdot\|\cdot)$ $\Rightarrow$
Bayesian escort predictive state $\omega^n_{\pi,\beta}$ $\Rightarrow$ $\mathrm{WAIC} = \mathcal{L}_{bt} + \frac{\beta}{n} V$ ($\Rightarrow$ "True" state $\psi$).

First, rate functions are specified by procedures in LDP. A rate function measures
the extent to which one state diverges from a "true" one. Secondly, we construct predictive
states from models and data by applying the results of several steps whose starting point is
the rate function provided by the first step. Thirdly, we define IC and use it for selecting
the best predictive state from candidates. Lastly, we select one state which should be
treated as a "true" one $\psi$ in Section 2.3.1. Taking this step, we can reach a "true" state
by using the methods in Section 2.3.1 such as Theorems 2.9, 2.10 and 2.12. As stated in
Section 2.3.1, IC are estimators for rate functions as quasi-distances from a "true" state
to a predictive state, which have bias terms based on the method to construct predictive
states.



3 Examples 

Once the sector structure consisting of mutually disjoint factor states is clarified, the
Micro-Macro duality starts to be valid, according to which the present method of LDS
becomes effective. From this viewpoint, the following examples are instructive in the sense
that the method in LDS second level enables us to reduce complicated dynamical systems
partially to kinematics.

(1) Non-equilibrium states in quantum field theory

The method established in [6] is used for describing non-equilibrium states in QFT.
The universal model of the sector structure relevant to this context is known to be provided
by a family of factor KMS (Kubo-Martin-Schwinger) states $\{\omega_{\beta,\mu} \mid \beta > 0, \mu \in K\}$ on a von
Neumann algebra $\mathfrak{M}$ of type III parametrized by the inverse temperature $\beta$ and by all
other necessary thermodynamic parameters, such as a chemical potential, denoted
collectively by $\mu \in K$. Following the ideas in [6], we can write a non-equilibrium state of the
system whose reference states are $\{\omega_{\beta,\mu} \mid \beta > 0, \mu \in K\}$ on a von Neumann algebra $\mathfrak{M}$ of
type III as follows:

$$\omega_{B,\rho} = \int_B d\rho(\beta, \mu)\, \omega_{\beta,\mu}, \tag{60}$$

where $B$ is a compact subset of $\mathbb{R}_{>0} \times K$ and $\rho$ is a regular Borel measure on $\mathbb{R}_{>0} \times K$. In
this situation, we can construct a model $\{p_\theta(\beta, \mu) \mid \theta \in \Theta \subset \mathbb{R}^d : \text{compact}\}$ of probability
distributions, in terms of which the method of statistical inference can be systematically
applied for the purpose of further developments of the theory of non-equilibrium states
in QFT.

(2) Conformal field theory and critical phenomena 

Let C be a C*-algebra generated by {e"^^", e^'~^\n G Z} such that operators {Ln} and a 
self-adjoint operator $C$ on a Hilbert space satisfy

$$[L_m, L_n] = (m - n)L_{m+n} + \frac{C}{12}(m^3 - m)\,\delta_{m+n,0}, \qquad [L_n, C] = 0. \tag{61}$$

Consider a family of states $\{\omega_c \mid c \in \mathrm{Spec}(C) \subset \mathbb{R}\}$ and put

$$\omega_R = \int_R d\sigma(c)\, \omega_c, \tag{62}$$

where $\sigma$ is a regular Borel measure on $\mathrm{Spec}(C)$ and $R$ is a compact subset of $\mathrm{Spec}(C)$. In
view of the accumulated applications of conformal field theory, it would be natural to
expect the possibility of a systematic theory of statistical estimation for critical phenomena
in solid state physics on the basis of the mathematical knowledge about the reducible
representations and states of this kind, which would be the target for future tasks.

4 Discussion on Quantum Estimation Theory: Quan- 
tum Model Selection 

In Section 2.3.2, we have discussed estimation theory for quantum states. Remarkably, the
methods developed here allow us to take full advantage of the usual measure-theoretical
analysis in statistics, information theory and learning theory even for estimation of quantum
states in the context of quantum theory, which is due to our bringing the use of the
central measure $\mu_\omega$ of $\omega \in E_{\mathfrak{A}}$ into focus. This should be contrasted with many previous
attempts in quantum estimation theory, where much effort has been expended in vain on
formulating new notions or "quantized versions" of the notions known in
classical (measure-theoretical) statistics, information theory and learning theory. Instead,
what is most crucial here is the difference in the method of inference according to whether
a state to be estimated is factor or not. Since the methods discussed in Section 2.3 are for
non-factor states, an analysis different from the one for factor states needs to be built up. On
the other hand, we have succeeded in constructing quantum model selection, which is a
quantum version of model selection, by using measure-theoretic methods. Model selection
began when Akaike introduced the concept of information criteria in 1971 [2, 3] to resolve
the insufficiency of hypothesis testing for selecting the best predictive distribution. The
best known and most widely used one is the Akaike information criterion (AIC)

$$\mathrm{AIC} = -\frac{1}{n} \sum_{j=1}^{n} \log p(x_j|\theta_{MLE}) + \frac{d}{n}, \tag{63}$$

where $\theta_{MLE}$ is the maximum likelihood estimator (MLE) and $d$ is the dimension of parameters.
AIC can be applied in the situation that the maximum likelihood method, or
the M-estimation method, is used for regular models. Furthermore, WAIC appearing in
singular statistics [33, 35] is another version of IC and contains AIC and TIC as a special
case. Roughly speaking, model selection is the method for selecting the predictive distri- 
bution which attains the minimum of IC in several candidates. Although AIC has been 
used for quantum states in [521 , the reason has not been clarified why we can apply 
it for quantum states. Because they applied AIC to a general positive operator-valued 
measure (POVM), not to PVM's, their use is not precisely in the first level. However, 
with the help of measuring processes and Naimark dilation, we can justify their use. Thus 
it is desirable to examine the validity of the use of IC for quantum states. 
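For regular classical models, selection by AIC in the per-sample normalization $-\frac{1}{n}\sum_j \log p(x_j|\theta_{MLE}) + \frac{d}{n}$ can be sketched as follows. The two candidate models (a Gaussian with unit variance and a Gaussian with free variance) and all function names are hypothetical choices made only for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def gaussian_mean_loglik(xs: Sequence[float], mu: float, var: float) -> float:
    """Average log-likelihood (1/n) sum_j log N(x_j | mu, var)."""
    n = len(xs)
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var) for x in xs) / n


def aic(xs: Sequence[float], mu: float, var: float, d: int) -> float:
    """Per-sample AIC: minus the average maximized log-likelihood plus d / n."""
    return -gaussian_mean_loglik(xs, mu, var) + d / len(xs)


def select_model(xs: Sequence[float]) -> Tuple[str, Dict[str, float]]:
    """Compare N(mu, 1) (d = 1) with N(mu, sigma^2) (d = 2) at their MLEs."""
    n = len(xs)
    mu_hat = sum(xs) / n
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # MLE of the variance
    scores = {
        "fixed_variance": aic(xs, mu_hat, 1.0, d=1),
        "free_variance": aic(xs, mu_hat, var_hat, d=2),
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    print(select_model([-6.0, -3.0, 0.0, 3.0, 6.0]))
    print(select_model([-1.0, 1.0, -1.0, 1.0]))
```

In the second data set both models fit equally well, so the penalty term alone decides in favour of the model with fewer parameters.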

Remark 4.1. It is occasionally said that, by using AIC or BIC, some model with fewer
parameters is automatically chosen. However, this statement is not precise and is no more
than hindsight: if two models have almost equal training errors, then the AIC of the model
with fewer parameters becomes smaller than the others, and the model having the smallest
AIC is naturally chosen. As stated in Section 2.3.2, we should fix a "true" state by using
the predictive state and test the performance of the latter compared with other predictive
states. Therefore, we should use the predictive state selected by IC flexibly, without giving
it absolute priority.

In recent years, algebraic geometry and algebraic analysis have been successfully applied to
the singular aspects in learning theory. Many statistical models, such as the normal
mixture model

$$f(x|a, b, c) = \sum_{j=1}^{M} a_j\, \varphi(x|b_j, c_j), \tag{64}$$

where $a = (a_1, \dots, a_M)$ such that $a_1, \dots, a_M \ge 0$ and $\sum_{j=1}^{M} a_j = 1$, $b = (b_1, \dots, b_M) \in \mathbb{R}^M$,
$c = (c_1, \dots, c_M) \in (\mathbb{R}_+)^M$ and $\varphi(x|b, c) = \frac{1}{\sqrt{2\pi c^2}} \exp\big[-\frac{(x - b)^2}{2c^2}\big]$, have degenerate
Fisher information matrices, so that Riemannian-geometric methods cannot be applied.
Then the Cramér-Rao inequality

$$V(\theta) \ge J^{-1}(\theta), \tag{65}$$

no longer has any significance, where $V(\theta) = (E_x[(\hat\theta_i(x) - \theta_i)(\hat\theta_j(x) - \theta_j)])_{ij}$
and $J^{-1}(\theta)$ are, respectively, the covariance matrix for $\theta = (a, b, c)$ and the inverse of
the Fisher information matrix $J(\theta)$. Therefore, different methods using algebraic geometry
and algebraic analysis are investigated, which work efficiently in various areas, and
are already used in a textbook [8], which has encouraged us to use algebraic geometric
methods.
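The degeneracy of the Fisher information matrix can be checked numerically on a small instance. The sketch below uses a two-component mixture $f(x|a,b) = (1-a)\,N(x|0,1) + a\,N(x|b,1)$, a simplified special case chosen for this illustration, and computes the Fisher matrix with respect to $(a, b)$ by the trapezoid rule; the integration range and step count are ad hoc assumptions.

```python
import math
from typing import List


def normal_pdf(x: float, mean: float) -> float:
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)


def fisher_matrix(a: float, b: float, lo: float = -12.0, hi: float = 12.0,
                  steps: int = 4000) -> List[List[float]]:
    """Fisher information of f(x|a,b) = (1-a) N(x|0,1) + a N(x|b,1) w.r.t. (a, b)."""
    h = (hi - lo) / steps
    j = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0     # trapezoid end-point weights
        n0, nb = normal_pdf(x, 0.0), normal_pdf(x, b)
        f = (1 - a) * n0 + a * nb
        if f <= 0.0:
            continue
        # Score vector: derivatives of log f with respect to a and b.
        score = ((nb - n0) / f, a * (x - b) * nb / f)
        for r in range(2):
            for c in range(2):
                j[r][c] += weight * h * f * score[r] * score[c]
    return j


def det2(m: List[List[float]]) -> float:
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


if __name__ == "__main__":
    print(det2(fisher_matrix(0.0, 1.0)))   # singular point: b is not identifiable
    print(det2(fisher_matrix(0.5, 1.0)))   # regular point
```

At $a = 0$ the parameter $b$ drops out of the density, the corresponding score vanishes identically, and the matrix cannot be inverted, which is the situation in which (65) stops being informative.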

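Lastly, we give a generalization of quantum hypothesis testing. Suppose that $\psi, \varphi \in E_{\mathfrak{A}}$
have central measures given, for any integrable function $f$, by

$$\int f(\rho)\, d\mu_\psi(\rho) = \sum_{j=1}^{m} \alpha_j f(\rho_j), \qquad \int f(\rho)\, d\mu_\varphi(\rho) = \sum_{j=1}^{m} \beta_j f(\rho_j),$$

with $\sum_j \alpha_j = \sum_j \beta_j = 1$, $0 < \alpha_j, \beta_j < 1$ ($j = 1, \dots, m$), corresponding, respectively, to
the following diagonal matrices:

$$\sigma_\psi = \mathrm{diag}(\alpha_1, \alpha_2, \dots, \alpha_m), \qquad \sigma_\varphi = \mathrm{diag}(\beta_1, \beta_2, \dots, \beta_m). \tag{66}$$

Let $E_\psi(\Delta)$ ($\Delta \in \mathcal{B}(\mathrm{supp}\,\mu_\psi)$) be the PVM corresponding to $\mu_\psi$ (see Eq. (14)). It is
immediately seen that $S(\psi\|\varphi) = D(\mu_\psi\|\mu_\varphi) = S(\sigma_\psi\|\sigma_\varphi)$. We treat the state $\psi$ as a
"true" one and assume that a test function of interest $S^n : (\mathrm{supp}\,\mu_\psi)^n \to \{0, 1\}$ has a
positive operator representation $A_n$ on $M(m, \mathbb{C})^{\otimes n}$ such that $0 \le A_n \le I_{M(m,\mathbb{C})^{\otimes n}}$. Then,
the error probabilities of the first kind and the second kind can, respectively, be defined
by

$$\alpha_n(A_n) := \mathrm{Tr}[\sigma_\psi^{\otimes n}(I - A_n)], \qquad \beta_n(A_n) := \mathrm{Tr}[\sigma_\varphi^{\otimes n} A_n].$$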



The following theorem is a generalization of quantum Stein's lemma and can be proved 
by the same method as found in [151 EH [12], valid for the version of quantum Stein's 
theorem in these papers. 

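Theorem 4.1. For any $0 < \epsilon < 1$, it holds that

$$\lim_{n\to\infty} \frac{1}{n} \log \beta_n^*(\epsilon) = -S(\psi\|\varphi), \tag{67}$$

where $\beta_n^*(\epsilon)$ is the minimum second error probability under the constraint that the first
error probability is at most $\epsilon$, i.e.,

$$\beta_n^*(\epsilon) = \min\{\beta_n(A_n) \mid A_n \in M(m, \mathbb{C})^{\otimes n},\ 0 \le A_n \le I_{M(m,\mathbb{C})^{\otimes n}},\ \alpha_n(A_n) \le \epsilon\}. \tag{68}$$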



It is important that $\{A_n\}$ are merely operator-valued representations of tests $\{S^n\}$
without actual uses in measurements, where $E_\psi(\Delta)$ is actually used. It is obvious that
the quantum relative entropy formulated in the present paper is accessible to actual
experimental situations, and its operational meaning is different from that in [15, 20, 21],
which is formulated for quantum i.i.d. states.
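Because the matrices in this setting are diagonal and commute, the asymptotics of the second-kind error can be explored with ordinary binomial computations. The sketch below assumes $m = 2$ with first weights `a1 > b1` and restricts attention to deterministic threshold (likelihood-ratio) tests, which is an assumption of this example rather than the full optimization over $0 \le A_n \le I$; it evaluates $-\frac{1}{n}\log\beta_n$ under the constraint $\alpha_n \le \epsilon$ and compares it with the relative entropy.

```python
import math


def log_binom_pmf(n: int, k: int, p: float) -> float:
    """Logarithm of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def relative_entropy(a1: float, b1: float) -> float:
    """D(alpha || beta) for two-point distributions (a1, 1 - a1) and (b1, 1 - b1)."""
    return a1 * math.log(a1 / b1) + (1 - a1) * math.log((1 - a1) / (1 - b1))


def stein_rate(n: int, a1: float, b1: float, eps: float) -> float:
    """-(1/n) log beta_n for the threshold test 'accept if count >= c', assuming a1 > b1.

    c is the largest threshold whose first-kind error P_alpha(count < c) is at most eps.
    """
    cumulative, c = 0.0, 0
    for k in range(n + 1):
        pk = math.exp(log_binom_pmf(n, k, a1))
        if cumulative + pk > eps:
            break
        cumulative += pk
        c = k + 1
    logs = [log_binom_pmf(n, k, b1) for k in range(c, n + 1)]
    top = max(logs)
    return -(top + math.log(sum(math.exp(v - top) for v in logs))) / n


if __name__ == "__main__":
    for n in (200, 2000, 20000):
        print(n, round(stein_rate(n, 0.6, 0.3, 0.1), 4), round(relative_entropy(0.6, 0.3), 4))
```

The exponent approaches the relative entropy slowly, with a gap of order $1/\sqrt{n}$ for fixed $\epsilon$, so fairly large $n$ is needed before the limit in a Stein-type statement becomes visible.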
5 Conclusion and Perspective

In this paper we have proposed Large Deviation Strategy and established its first and
second levels. For this purpose, we have clarified that the quantum relative entropy
plays the role of the rate function in LDP 2nd level, according to which several
measure-theoretical methods work efficiently in the quantum case.

While results of this sort have been anticipated on the basis of a simple analogy to
the classical case or direct computations in some special situations, the pertinence of such
formal derivations has been questionable for lack of an appropriate operational setting
to guarantee the appropriate interpretation. In the present case, we can safely use the
natural relation, $S(\psi\|\omega) = D(\nu\|\mu)$, due to Theorem 2.6 (HOT83), to bridge the quantum
context with the classical one, which sweeps away all the suspicions.

However, the situation about the estimation theory for the internal structures of a 
factor state is quite different, which seems to require some new ideas. For this purpose, 
the measurement scheme formulated in [211 |TT] would be instructive as its aim is to 
search the internal structure of a factor state. To proceed further along the present line 
of thoughts, the tasks to construct estimation theory for factor states and to establish the 
third and fourth levels in LDS will be crucially important. 
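A Barycentric Decomposition of States and Extension of Algebra

For a barycentric decomposition, $\omega = \int \rho\, d\mu(\rho)$, of a state $\omega$ of $\mathfrak{A}$ by an orthogonal
measure $\mu$ we have a spectral measure $E_\mu := (\mathcal{B}(\mathrm{supp}\,\mu) \ni \Delta \mapsto E_\mu(\Delta) := \kappa_\mu(\chi_\Delta) \in \mathfrak{B})$
on $E_{\mathfrak{A}}$, taking values in a subalgebra $\mathfrak{B}$ of the commutant $\pi_\omega(\mathfrak{A})'$ but not in $\pi_\omega(\mathfrak{A})''$ of
observables. When we consider a physical process described by this spectral measure,
it involves the object system with $\pi_\omega(\mathfrak{A})''$ $(\subset L^\infty(E_{\mathfrak{A}}, \mu)')$ and the measuring one $\mathfrak{B}$ $(\subset
\pi_\omega(\mathfrak{A})')$, the latter of which registers the indices to determine states of $\mathfrak{A}$. According to
the measurement scheme pH [TT], we have a composite system consisting of these two
algebras $\pi_\omega(\mathfrak{A})''$ and $\mathfrak{B}$ through a suitable measurement coupling, which amounts to the
extension of algebra:

$$\pi_\omega(\mathfrak{A})'' \hookrightarrow \pi_\omega(\mathfrak{A})'' \vee \mathfrak{B} \cong \pi_\omega(\mathfrak{A})'' \otimes \mathfrak{B} \subset \pi_\omega(\mathfrak{A})'' \vee \pi_\omega(\mathfrak{A})' = \mathfrak{Z}_\omega(\mathfrak{A})'. \tag{69}$$

In this context, we can extend the state $\omega \in E_{\mathfrak{A}}$ to the one $\tilde\omega$ on the algebra $\pi_\omega(\mathfrak{A})'' \vee \mathfrak{B}$
simply defined for $A \in \pi_\omega(\mathfrak{A})'' \vee \mathfrak{B}$ by

(70)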
The naturality of this procedure is understood in relation with Tomita-Takesaki modular
theory [5, 28], which plays vital roles in LDS 2nd level to define the quantum relative
entropy. Here we note also that $\mathfrak{B}$ is an abelian subalgebra of $\pi_\omega(\mathfrak{A})'$ and that $\pi_\omega(\mathfrak{A})' =
J\pi_\omega(\mathfrak{A})''J$ in terms of the modular conjugation operator $J$ which lives in the standard
representation of $\pi_\omega(\mathfrak{A})''$. The most important barycentric measures apart from central
ones are extremal measures corresponding to a maximal abelian subalgebra of $\pi_\omega(\mathfrak{A})'$.
The measure is pseudosupported by the pure states $\mathcal{E}(E_{\mathfrak{A}})$ over $\mathfrak{A}$.

Let $\mu$ be an orthogonal measure with a barycenter $\psi \in E_{\mathfrak{A}}$ such that there is a
subcentral measure $m$ satisfying $\mu \ll m$. By the same discussion as in Section 2.3.1, we
define for any $\Gamma \in \mathcal{B}^{cy}(M_1(E_{\mathfrak{A}}))$,

$$Q^{(2)}_{n,\mu}(\Gamma) := P_\mu(L_n \in \Gamma). \tag{71}$$

Then, the next theorem holds.

Theorem A.1. Let $\mathfrak{A}$ be a separable C*-algebra, $\psi$ be a state on $\mathfrak{A}$, and $\mu$ be a barycentric
measure of $\psi$. Then $Q^{(2)}_{n,\mu}$ satisfies LDP with the rate function $D(\cdot\|\mu)$ $(= S(b(\cdot)\|\psi))$:

$$-D(\Gamma^\circ\|\mu) := -\inf_{\nu \in \Gamma^\circ} D(\nu\|\mu) \le \liminf_{n\to\infty} \frac{1}{n} \log Q^{(2)}_{n,\mu}(\Gamma) \le \limsup_{n\to\infty} \frac{1}{n} \log Q^{(2)}_{n,\mu}(\Gamma) \le -\inf_{\nu \in \overline{\Gamma}} D(\nu\|\mu) =: -D(\overline{\Gamma}\|\mu) \tag{72}$$

for any $\Gamma \in \mathcal{B}^{cy}(M_1(E_{\mathfrak{A}}))$. If there exists a subcentral measure $m$ such that $\nu, \mu \ll m$,
then $D(\nu\|\mu) = S(b(\nu)\|\psi)$ holds for such $\nu$ belonging to $\overline{\Gamma}$ (or $\Gamma^\circ$) that $D(\nu\|\mu) = D(\overline{\Gamma}\|\mu)$
or $D(\Gamma^\circ\|\mu)$.

The results in Section 2.3.1 and Section 4 also hold for general barycentric measures.
It is, however, necessary to keep in mind that barycentric decompositions in general do
not always have clear physical meaning, in contrast to a central decomposition. To extend
quantum estimation theory this point has to be resolved.